High Efficiency on Prediction of Translation Initiation Site (TIS) of RefSeq Sequences

Authors

  • Cristiane Neri Nobre
  • J. Miguel Ortega
  • Antônio de Pádua Braga

Abstract

An important task in gene discovery is the correct prediction of the translation initiation site (TIS). The TIS can correspond to the first AUG, but this is not always the case. The task can be modeled as a classification problem that separates positive (TIS) from negative patterns. Here we used a Support Vector Machine (SVM) trained on data processed by the class-balancing method SMOTE (Synthetic Minority Over-sampling Technique). SMOTE was used because the databases in this work are strongly imbalanced, with an average positive-to-negative pattern ratio of about 1:28. As a result, we attained average accuracy, precision, sensitivity, and specificity values of 99%.
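SMOTE, as commonly described, balances the classes by creating synthetic minority (here, TIS) examples on the line segment between a minority sample and one of its k nearest minority-class neighbours. The sketch below is a minimal, hypothetical pure-Python illustration, not the authors' implementation: it assumes each candidate site has already been encoded as a numeric feature vector (the encoding is not described in this abstract), and the names `smote` and `metrics` and the default k = 5 are illustrative choices. The SVM step is omitted. The second function shows how the four reported measures follow from the confusion-matrix counts.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Hypothetical SMOTE-style oversampler (a sketch, not the paper's code).

    minority: list of equal-length numeric feature vectors (positive/TIS class).
    Each synthetic vector lies between a randomly chosen minority sample and
    one of its k nearest minority-class neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority-class neighbours of every sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity and specificity from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
    }


if __name__ == "__main__":
    positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # At a 1:28 ratio, 4 positives face 112 negatives; 108 synthetic
    # positives would bring the classes to parity.
    new = smote(positives, n_synthetic=108, k=3, seed=0)
    print(len(new))  # 108
    print(metrics(tp=90, fp=5, tn=95, fn=10))
```

Interpolating only between minority neighbours keeps the synthetic points inside the region already occupied by real positives, which is why SMOTE is usually preferred over simply duplicating positives. For real use, an established implementation such as `SMOTE` in the imbalanced-learn package is likely a safer choice than this sketch.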

Publication date: 2007